Establishment of a prognostic model for gastric cancer patients who underwent radical gastrectomy using machine learning: a two-center study

Objective Gastric cancer is a prevalent gastrointestinal malignancy worldwide. This study used machine learning to develop a prognostic model for gastric cancer patients who underwent radical gastrectomy and to investigate risk factors for postoperative mortality in these patients.
Methods Data from 295 patients with gastric cancer who underwent radical gastrectomy at the Department of General Surgery, Affiliated Hospital of Xuzhou Medical University (Xuzhou, China) between March 2016 and November 2019 were retrospectively analyzed as the training group. An additional 109 patients who underwent radical gastrectomy at the Department of General Surgery, Jining First People’s Hospital (Jining, China) were included for external validation. Four machine learning models were built: logistic regression (LR), decision tree (DT), random forest (RF), and gradient boosting machine (GBM). Model performance was assessed by comparing the area under the curve (AUC) of each model. An LR-based nomogram model was constructed to assess patients’ clinical prognosis.
Results Lasso regression identified eight associated factors: age, sex, maximum tumor diameter, nerve or vascular invasion, TNM stage, gastrectomy type, lymphocyte count, and carcinoembryonic antigen (CEA) level. In the training group, the AUC values were 0.795, 0.759, 0.873, and 0.853 for LR, DT, RF, and GBM, respectively. In the validation group, the AUC values were 0.734, 0.708, 0.746, and 0.707 for LR, DT, RF, and GBM, respectively. The LR-based nomogram model performed well in evaluating clinical prognosis.
Conclusion Machine learning algorithms are robust tools for evaluating the prognosis of gastric cancer patients who have undergone radical gastrectomy. The LR-based nomogram model can aid clinicians in making more reliable clinical decisions.


Introduction
Gastric cancer (GC) is reported to be the fifth most common cancer and the third most common cause of cancer-related death worldwide. Notably, China and Japan are at the forefront, collectively accounting for 75% of Asian cases (1,2). Although surgery is one of the most common treatment modalities for gastric cancer, surgical intervention alone has failed to raise the overall 5-year survival rate above 50%. Precise clinical assessment is therefore of great importance for the diagnosis and management of affected patients (3). One widely used approach in clinical research is to collect clinical data and construct prognostic models. Gastric cancer model studies have proliferated, offering the promise of better-informed clinical decision-making (4,5). In addition to clinicopathological data, these models incorporate hematologic inflammatory markers and the widely used carcinoembryonic antigen (CEA). The relationship between inflammation, as reflected by blood-based metrics, and the occurrence, progression, metastasis, and prognosis of cancer has become a growing area of research interest (6,7). The use of CEA as a serum tumor marker is well established in clinical practice, and the marker is widely used in the early screening of various tumors. Furthermore, its early elevation is recognized as an independent risk factor for poorer prognosis in gastric cancer (8).
Machine learning, a branch of artificial intelligence, is well suited to analyzing large and complex medical datasets. Its capacity to construct clinical prediction models makes it a valuable tool in healthcare, assisting in diagnosis and prognostication (9). Clinical predictive models are typically developed by processing and optimizing large datasets within a training set. The models are then tested on an external validation set, a pivotal step in establishing their external validity and, by extension, their applicability to diverse patient populations (10,11). Cancer, marked by its complexity and heterogeneity, is a particularly promising field for machine learning applications in medical research. The available clinical data can support early cancer detection, ongoing monitoring of disease progression, and the optimization of treatment strategies (9,12).

Patients' enrollment
This retrospective analysis involved a total of 295 gastric cancer patients who underwent radical gastrectomy at the Department of General Surgery, Affiliated Hospital of Xuzhou Medical University (Xuzhou, China), between March 2016 and November 2019. These patients constituted the training group. Additionally, 109 gastric cancer patients who underwent radical gastrectomy at the Department of General Surgery of Jining First People's Hospital (Jining, China) were included as the validation group. The inclusion criteria were as follows: (1) patients newly diagnosed with gastric cancer, for whom comprehensive medical records were available; (2) primary radical resection of gastric cancer performed at the respective hospitals, with subsequent pathological confirmation of gastric adenocarcinoma; (3) no anti-tumor therapy, including radiotherapy or chemotherapy, before surgery. The exclusion criteria were as follows: (1) patients with concurrent malignancies; (2) patients with preoperative infectious diseases, blood system disorders, autoimmune conditions, or any other medical conditions that could influence inflammatory markers; (3) patients who had recently received or were currently undergoing anti-inflammatory or immunosuppressive treatment; (4) patients who received preoperative blood transfusion; (5) patients with severe liver or kidney dysfunction; (6) patients with incomplete clinical data or follow-up information. Further details are illustrated in Figure 1.

Outcome measures
The primary outcome was the survival status of patients three years after radical gastrectomy. Follow-up was conducted by telephone or outpatient visits. Survival time was measured from the date of admission to either the date of death or the end of follow-up.
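As a minimal sketch of how a three-year outcome of this kind could be derived from follow-up dates, consider the function below. The name `three_year_status`, the 1,095-day approximation of three years, and the "censored" label for patients with less than three years of follow-up are illustrative assumptions, not details reported by the study.

```python
from datetime import date

# Assumption: three years is approximated as 3 * 365 days.
THREE_YEARS_DAYS = 3 * 365


def three_year_status(admission, death=None, last_follow_up=None):
    """Classify a patient as 'died', 'survived', or 'censored' at three years.

    Time is counted from the date of admission, as in the outcome definition.
    """
    # Death within the three-year window is the outcome event.
    if death is not None and (death - admission).days <= THREE_YEARS_DAYS:
        return "died"
    # Otherwise the patient counts as a survivor only if observed for
    # at least three years (either still alive or died after the window).
    end = death if death is not None else last_follow_up
    if end is not None and (end - admission).days >= THREE_YEARS_DAYS:
        return "survived"
    # Hypothetical handling: follow-up too short to determine the outcome.
    return "censored"


status_a = three_year_status(date(2016, 3, 1), death=date(2017, 5, 1))
status_b = three_year_status(date(2016, 3, 1), last_follow_up=date(2019, 6, 1))
status_c = three_year_status(date(2018, 1, 1), last_follow_up=date(2019, 1, 1))
```

A binary label of this form (died versus survived at three years) is what a classifier such as LR would take as its target.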

Research purpose
This study evaluated the three-year survival outcomes of patients who underwent radical gastrectomy. A total of 404 gastric cancer patients from two medical centers were included. Machine learning algorithms were used to develop a clinical prediction model aimed at identifying prognostic risk factors in postoperative patients. A visual nomogram model based on these risk factors can aid healthcare professionals in conducting risk assessments.

Risk factors
For each study subject, clinical data were collected, including the patient's name, age, gender, and clinicopathological information. This included tumor location, maximum tumor size, TNM stage, lymph node involvement, nerve or vascular invasion, method of gastrectomy, and tumor differentiation grade, along with specific blood markers: neutrophil count, monocyte count, lymphocyte count, and CEA level. Peripheral venous blood samples were obtained from fasting patients on the next morning. The collected indices were then entered into the Lasso regression model. Lasso applies a penalty that can shrink the coefficients of unimportant variables to exactly 0, thereby performing feature selection. After the inclusion and exclusion criteria were applied, the relevant data were fed into the Lasso model, which eliminated the weights of the least important variables entirely. This process screens the data and adjusts model complexity while fitting the generalized linear model. Consequently, the Lasso step limits the subsequent machine learning models to the most informative variables.
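The way an L1 penalty drives coefficients to exactly zero can be illustrated with the soft-thresholding operator. In the idealized case of standardized, uncorrelated predictors in a linear model, the Lasso solution is the soft-thresholded least-squares coefficient, and coordinate-descent solvers apply the same operator repeatedly in the general case. The sketch below uses hypothetical coefficient values and a hypothetical penalty `lam`; it does not reproduce the study's fitted model.

```python
def soft_threshold(beta, lam):
    """Shrink a coefficient toward zero by lam; return exactly 0 if |beta| <= lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


def lasso_select(unpenalized_coefs, lam):
    """Apply soft-thresholding to each coefficient and keep the nonzero ones."""
    shrunk = {name: soft_threshold(b, lam) for name, b in unpenalized_coefs.items()}
    return {name: b for name, b in shrunk.items() if b != 0.0}


# Hypothetical unpenalized coefficients, for illustration only.
coefs = {
    "age": 0.9,
    "tnm_stage": 1.4,
    "monocyte_count": 0.05,
    "neutrophil_count": -0.1,
}
selected = lasso_select(coefs, lam=0.2)
# Variables whose coefficients are smaller in magnitude than lam are dropped;
# the remaining coefficients are shrunk by lam.
```

Increasing `lam` removes more variables. In practice the penalty is usually chosen by cross-validation rather than set by hand.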

Statistical analysis
Continuous variables were presented as mean ± standard deviation, and categorical variables were expressed as proportions. To create the machine learning and nomogram models, a Lasso regression model was first applied to identify the key risk factors linked to the 3-year survival status of patients, as depicted in Figure 1. These risk factors were then entered into machine learning algorithms to develop logistic regression (LR), decision tree (DT), random forest (RF), and gradient boosting machine (GBM) models. Model performance was assessed by comparing the area under the curve (AUC) of each model. Finally, an LR model was selected to construct a nomogram, enhancing the interpretability and visibility of the results.

Feature selection and machine learning performance evaluation
To reduce model complexity and eliminate redundant or irrelevant data in the training group, we applied the Lasso regression model to screen the variables, as illustrated in Figures 2A, B. Four machine learning models (LR, DT, RF, and GBM), illustrated in Figures 3-6, were used in this study. LR is a classification algorithm that relates features to the probability of a specific outcome. It does not presuppose the data distribution and presents results as probabilities, making it appropriate for many probability-assisted decision-making tasks. However, LR handles nonlinear data poorly and is sensitive to imbalanced data and multicollinearity (13,14). DT is primarily used for classification tasks. A decision tree starts from a root node, the initial decision point in a dataset, and splits on the features that best divide the dataset into distinct classes. DT handles irrelevant features well and yields a model that is easy to understand and explain: the tree can be visualized and analyzed, giving a clear interpretation of the underlying rules. DT is also effective in dealing with missing data (15). RF, an extension of the DT method, combines multiple DTs, with the majority vote among the trees determining the final class prediction. RF incurs a substantial training cost, and its decisions are sensitive to how feature values are split (16,17). GBM is a boosting technique that builds an additive model by numerically minimizing a loss function. It is effective on small datasets, performs well on multi-class tasks, and accommodates incremental training. GBM also tolerates missing data well. However, its performance diminishes in high-dimensional feature spaces. Its effectiveness in classification also depends on how feature attributes are split, making it sensitive to the representation of the input data (18, 19).
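The majority-vote step that distinguishes RF from a single DT can be sketched as follows. The three "trees" here are hand-written threshold rules with hypothetical cut-offs and field names. A real random forest would learn many trees from bootstrap samples and random feature subsets, which this sketch omits.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical single-split trees; the thresholds are illustrative only.
def tree_stage(patient):
    return 1 if patient["tnm_stage"] >= 3 else 0


def tree_diameter(patient):
    return 1 if patient["max_diameter_cm"] >= 5 else 0


def tree_age(patient):
    return 1 if patient["age"] >= 70 else 0


def forest_predict(patient, trees):
    """Collect one vote per tree and return the majority class (1 = high risk)."""
    return majority_vote([tree(patient) for tree in trees])


trees = [tree_stage, tree_diameter, tree_age]
high_risk = forest_predict({"tnm_stage": 3, "max_diameter_cm": 6, "age": 55}, trees)
low_risk = forest_predict({"tnm_stage": 1, "max_diameter_cm": 6, "age": 50}, trees)
```

Using an odd number of trees for a binary outcome avoids tied votes.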
Model performance was evaluated using several metrics, including accuracy, recall, and the area under the ROC curve (AUC), a primary indicator of binary classification performance that ranges from 0 to 1, with higher values signifying better performance. For the binary outcome, we also reported the area under the precision-recall curve, which illustrates the trade-off between precision (positive predictive value) and recall (true positive rate), as well as the F1 score, defined as the harmonic mean of precision and recall. The models underwent 10-fold cross-validation on the training set and were subsequently tested on the test set, as shown in Tables 1 and 2.
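For reference, these metrics can be computed from first principles. The sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half), and derives precision, recall, and F1 from thresholded predictions. The labels, scores, and the 0.5 threshold are made-up illustrative values, not study data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        (1.0 if p > n else (0.5 if p == n else 0.0))
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 (harmonic mean of the two) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels (1 = died within three years) and predicted risks.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

The AUC depends only on how the scores rank the cases, whereas precision, recall, and F1 depend on the chosen threshold.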

Nomogram
LR was employed to construct a nomogram model for predicting the risk of mortality following radical gastrectomy, using the eight variables incorporated into the model. Lines 2 through 9 in the nomogram represent the risk scores associated with individual patients, as shown in Figure 7. The cumulative score serves as an indicator for assessing patients' prognoses, with higher scores signifying an increased risk level and a poorer prognosis.
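The usual way a nomogram turns LR coefficients into points can be sketched as follows: each predictor's contribution to the linear predictor is rescaled so that the predictor with the widest effect range spans 0 to 100 points, and the total points map back to a predicted risk through the logistic function. The intercept, coefficients, and value ranges below are hypothetical (all coefficients are assumed positive for simplicity) and are not the fitted values from this study.

```python
import math

# Hypothetical model: NOT the coefficients estimated in the study.
MODEL = {
    "intercept": -4.0,
    "predictors": {
        # name: (coefficient, minimum value, maximum value)
        "age": (0.03, 30, 90),
        "tnm_stage": (0.8, 1, 3),
        "nerve_or_vascular_invasion": (0.6, 0, 1),
    },
}


def points_per_unit(model):
    """Points per unit of linear predictor; the widest effect range gets 100 points."""
    max_effect = max(b * (hi - lo) for b, lo, hi in model["predictors"].values())
    return 100.0 / max_effect


def nomogram_points(model, patient):
    """Points contributed by each predictor, measured from its minimum value."""
    scale = points_per_unit(model)
    return {
        name: b * (patient[name] - lo) * scale
        for name, (b, lo, _) in model["predictors"].items()
    }


def predicted_risk(model, patient):
    """Convert total points back to the linear predictor, then to a probability."""
    total = sum(nomogram_points(model, patient).values())
    baseline = model["intercept"] + sum(
        b * lo for b, lo, _ in model["predictors"].values()
    )
    linear_predictor = baseline + total / points_per_unit(model)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {"age": 60, "tnm_stage": 3, "nerve_or_vascular_invasion": 1}
points = nomogram_points(MODEL, patient)
risk = predicted_risk(MODEL, patient)
```

Because the point scale is a linear rescaling of the LR linear predictor, reading a risk from the total-points axis gives the same result as evaluating the LR model directly.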

Patients' baseline characteristics
Patients' baseline characteristics are presented in Table 3. The training group consisted of 295 patients, of whom 93 (73 males and 20 females) died within 3 years. The validation group comprised 109 patients, with 25 deaths (14 males and 11 females). In the training group, age, maximum tumor diameter, TNM stage, lymph node metastasis, nerve or vascular invasion, type of gastrectomy, lymphocyte count, and CEA level differed significantly between patients who survived and those who died. In contrast, there were no statistically significant differences in gender, tumor differentiation, tumor site, neutrophil count, or monocyte count. In the validation group, significant differences were found in maximum tumor diameter, TNM stage, lymph node metastasis, and nerve or vascular invasion, while other variables did not differ significantly.

FIGURE 2 (A) Lasso regression coefficient path diagram. Lasso regression was used for dimensionality reduction to further screen the relevant variables. (B) Lasso regression cross-validation. Using ten-fold cross-validation, the λ value with the smallest cross-validation error is used as the optimal solution of the model.

Discussion
Machine learning employs computer algorithms to identify intricate relationships or patterns within extensive datasets. It does so by performing numerous operations with preexisting algorithms to recognize and analyze data. Through iterative adjustment of these algorithms, machine learning strives for optimal performance, producing models that link multiple input variables to target variables (20). In essence, supervised machine learning identifies associations between input and output data, enabling the prediction of outcomes from patients' data (21). Machine learning represents a fundamental shift in healthcare, in which computers glean insights from patient data without explicit programming for specific tasks. This approach offers enhanced capacity, objectivity, and repeatability when handling large datasets, thereby supporting data reliability (22,23). It has the potential to enhance the quality of early diagnosis, disease progression monitoring, and the ability to predict patient-specific outcomes in orthopedics, such as prognosis, risk of complications, and implant longevity (24). These advantages promote the sharing of decision-making information between healthcare professionals and patients, facilitating effective planning and rational utilization of healthcare services (25, 26). In addition, the model can be periodically retrained to improve prediction accuracy over time (27).
In the present study, Lasso regression was employed to identify 8 risk factors associated with postoperative mortality in gastric cancer patients. Additionally, we established four machine learning models to assess patient prognosis and created a nomogram based on LR to evaluate prognosis. Lasso regression filtered out non-significant variables during variable screening, thereby reducing data redundancy and enhancing the model's accuracy and reliability with fewer variables. This approach to developing clinical models has found applications in various medical domains (28,29). The models' performance was assessed using the ROC curve, with metrics such as AUC, sensitivity, specificity, and accuracy. Table 1 shows that all four models exhibit commendable accuracy, indicating the robust diagnostic capability of the machine learning models for predicting postoperative prognosis in gastric cancer patients. Table 2 further validates these findings in the validation group, demonstrating the models' strong external applicability. Collectively, these results underscore the effectiveness of machine learning models in accurately reflecting postoperative outcomes in gastric cancer surgery (30,31).
The postoperative prognosis nomogram provides an intuitive representation of prognostic risk in gastric cancer patients. Figure 7 illustrates the scores assigned to variables including age, gender, lymphocyte count, maximum tumor diameter, CEA level, nerve or vascular invasion, TNM stage, and gastrectomy method. In a previous study, Hu used conventional methods to establish a clinical model and reported that positive lymph nodes (LNs), tumor size, adjacent organ invasion, vascular invasion, CA125, depth of invasion, and HER2 status affected outcomes after radical gastrectomy (32). In the model established by our machine learning algorithm, age and gender were also identified as factors affecting prognosis after radical gastrectomy, which suggests that the machine learning approach can detect prognostic factors that conventional methods may not capture.
A nomogram serves as a valuable tool for stratifying the risk of patients, enabling clinicians to assess their conditions effectively. This model assigns scores to various characteristic variables, allowing clinicians to evaluate a patient's status based on these characteristics. Higher scores on the nomogram indicate an increased susceptibility to risk and a less favorable prognosis. Consequently, patients with distinct scores can benefit from tailored treatment strategies, ensuring a more personalized approach to their healthcare. For instance, whether to administer chemotherapy to postoperative gastric cancer patients is typically decided according to clinical recommendations for patients in stage IB to stage III. However, the decision regarding when to initiate chemotherapy for stage IB to stage III patients can be informed by the risk score derived from the nomogram. Among patients at the same stage, those with higher scores may be advised to pursue additional treatments. This approach effectively stratifies patients based on their individual conditions, facilitating personalized diagnosis and treatment.
Using Lasso regression, the study identified 8 risk factors for postoperative death in gastric cancer patients. In addition, 4 machine learning models were developed to assess patient prognosis, and a nomogram was established based on LR to predict patients' outcomes. Lasso regression effectively filtered out irrelevant factors, reducing data redundancy and enhancing model accuracy and reliability with fewer variables. This approach has been applied in various medical fields.

FIGURE 7 Nomogram. Lines 2 through 9 in the nomogram represent the risk scores associated with individual patients. The cumulative score serves as an indicator for assessing patients' prognoses, with higher scores signifying an increased risk level and a poorer prognosis.

Limitation
There are certain limitations in this study. The retrospective nature of the study may introduce subjective and selective biases. The reliability and validity of the data are therefore limited, and we cannot completely eliminate the possibility of selection bias. Moreover, despite being a two-center study, the sample size remains relatively limited. Further validation with large-scale research is essential to confirm the model's external applicability.

Conclusions
In conclusion, age, gender, lymphocyte count, maximum tumor diameter, CEA level, nerve or vascular invasion, TNM stage, and gastrectomy method could serve as risk factors influencing the postoperative survival of gastric cancer patients. The machine learning model, established through Lasso regression, demonstrated promising performance and reliability. The nomogram model, which is based on the LR model, provides a practical tool for individualized diagnosis and treatment in clinical settings.

FIGURE 1 Flowchart of patients' selection.

FIGURE 4 Performance of the DT model. The AUC, Sen, and Spe of the training and internal validation sets are shown in the figure. ROC, receiver operating characteristic; AUC, area under the curve; Sen, sensitivity; Spe, specificity. Blue line: Training set. Red line: Validation set.

FIGURE 3 Performance of the LR model. The AUC, Sen, and Spe of the training and internal validation sets are shown in the figure. ROC, receiver operating characteristic; AUC, area under the curve; Sen, sensitivity; Spe, specificity. Blue line: Training set. Red line: Validation set.

FIGURE 6 Performance of the GBM model. The AUC, Sen, and Spe of the training and internal validation sets are shown in the figure. ROC, receiver operating characteristic; AUC, area under the curve; Sen, sensitivity; Spe, specificity. Blue line: Training set. Red line: Validation set.

FIGURE 5 Performance of the RF model. The AUC, Sen, and Spe of the training and internal validation sets are shown in the figure. ROC, receiver operating characteristic; AUC, area under the curve; Sen, sensitivity; Spe, specificity. Blue line: Training set. Red line: Validation set.

TABLE 1 The model performance in the training dataset.